Dataset Description

This dataset is sourced from the SEER colorectal cancer database. https://seer.cancer.gov/data/

The dataset includes 5253 colorectal cancer cases diagnosed in Georgia.

Data were collected through hospital reports and state cancer registries following standardized reporting guidelines.

The study population comprises young adult males and females across all racial groups residing in Georgia aged 18-50.

The cases span the time period from January 1, 2000, to December 31, 2010.

Interactive Scatter Plot: Population vs Incidence Rate.


This scatter plot shows the relationship between population size and colorectal cancer incidence rate in Georgia counties (2000–2010).

Key Takeaways

Interactive Choropleth Map: County-Level Incidence.

Reading layer `tl_2010_13_county10' from data source 
  `E:\Project\Data\tl_2010_13_county10.shp' using driver `ESRI Shapefile'
Simple feature collection with 159 features and 18 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -85.60516 ymin: 30.35576 xmax: -80.75143 ymax: 35.00066
Geodetic CRS:  NAD83

This map shows county-level colorectal cancer incidence rates (2000–2010) in Georgia. The color intensity indicates the rate per 100,000 people.

Key Takeaways

Real-world impact

This dashboard helps identify Georgia counties with unusually high colorectal cancer incidence rates, guiding targeted public health action.

It provides a visual tool to support resource allocation, community outreach, and the investigation of geographic health disparities.

---
title: "Colorectal Cancer Incidence Dashboard"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(dplyr)
library(plotly)
library(leaflet)
library(sf)
library(htmltools)
library(htmlwidgets)
```



### Dataset Description

<div style="font-size:18px;">

This dataset is sourced from the SEER colorectal cancer database.  https://seer.cancer.gov/data/

The dataset includes 5253 colorectal cancer cases diagnosed in Georgia.

Data were collected through hospital reports and state cancer registries following standardized reporting guidelines.  

The study population comprises young adult males and females across all racial groups residing in Georgia aged 18-50.  

The cases span the time period from January 1, 2000, to December 31, 2010.

</div>

### Interactive Scatter Plot: Population vs Incidence Rate.

```{r}
data_file <- "E:/Project/Data/processed_filtered_male_2000.txt"
if (!file.exists(data_file)) stop("Data file not found.")

incidence_raw <- read.table(data_file, header = TRUE, sep = "\t",
                            stringsAsFactors = FALSE, fileEncoding = "UTF-8")
incidence_raw$Population <- as.numeric(gsub(",", "", incidence_raw$Population))
incidence_raw <- incidence_raw[!is.na(incidence_raw$Population) & incidence_raw$Population > 0, ]

incidence_data <- incidence_raw %>%
  group_by(GEOID, county_name) %>%
  summarise(
    Total_Pop = sum(Population),
    Total_Count = sum(Count),
    .groups = "drop"
  ) %>%
  mutate(
    Incidence_Rate = (Total_Count / Total_Pop) * 100000)
```

```{r}
plot_ly(
  incidence_data,
  x = ~Total_Pop,
  y = ~Incidence_Rate,
  text = ~paste0(
    "County: ", county_name, "<br>",
    "Population: ", formatC(Total_Pop, format = "d", big.mark = ","), "<br>",
    "Incidence Rate: ", sprintf("%.2f", Incidence_Rate), " per 100K"
  ),
  type = 'scatter',
  mode = 'markers',
  marker = list(size = 10, color = 'blue'),
  hoverinfo = "text"
) %>%
  layout(
    title = list(text = "Incidence Rate vs Population", y = 0.95),
    autosize = TRUE,
    margin = list(t = 60, b = 100, l = 60, r = 40),
    xaxis = list(title = "Total Population"),
    yaxis = list(title = "Incidence Rate per 100,000")
  ) %>%
  config(responsive = TRUE)
```

***

This scatter plot shows the relationship between population size and colorectal cancer incidence rate in Georgia counties (2000–2010).

Key Takeaways

- The majority of counties have populations under 500,000 and incidence rates between 5 and 20 per 100,000.

- Larger counties show less variability in incidence rates, clustering around consistent values.

- Counties with smaller populations tend to have greater variation, including outliers with very high rates, possibly due to data instability in low-population areas.


### Interactive Choropleth Map: County-Level Incidence.

```{r}
incidence_data$GEOID <- as.character(incidence_data$GEOID)

shp_file <- "E:/Project/Data/tl_2010_13_county10.shp"
if (!file.exists(shp_file)) stop("Shapefile not found.")

georgia_counties <- st_read(shp_file)
georgia_counties$GEOID10 <- as.character(georgia_counties$GEOID10)

g_map_data <- left_join(georgia_counties, incidence_data, by = c("GEOID10" = "GEOID"))
g_map_data <- g_map_data[!is.na(g_map_data$Incidence_Rate), ]

labels <- sprintf(
  "<strong>County:</strong> %s<br/>
   <strong>Total Population:</strong> %s<br/>
   <strong>Incidence Rate:</strong> %.2f per 100K",
  g_map_data$county_name,
  formatC(g_map_data$Total_Pop, format = "d", big.mark = ","),
  g_map_data$Incidence_Rate
) %>% lapply(htmltools::HTML)

pal <- colorNumeric("Blues", domain = g_map_data$Incidence_Rate, na.color = "transparent")
```

```{r}
leaflet(g_map_data, options = leafletOptions(zoomControlPosition = "bottomright")) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(
    fillColor = ~pal(Incidence_Rate),
    weight = 1,
    opacity = 1,
    color = "white",
    dashArray = "3",
    fillOpacity = 0.7,
    highlight = highlightOptions(
      weight = 3,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.7,
      bringToFront = TRUE
    ),
    label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "13px",
      direction = "auto"
    )
  ) %>%
  addLegend(
    pal = pal,
    values = ~Incidence_Rate,
    opacity = 0.7,
    title = "Incidence Rate<br>(per 100,000)",
    position = "topright"
  ) %>%
  addControl(
    html = "<div style='font-size:20px;'><strong>Colorectal Cancer Incidence in Georgia</strong></div>",
    position = "bottomleft"
  )
```

***

This map shows county-level colorectal cancer incidence rates (2000–2010) in Georgia. The color intensity indicates the rate per 100,000 people.

Key Takeaways

- Colorectal cancer incidence is not evenly distributed across Georgia’s counties.

- Several counties in central and southwestern Georgia display noticeably higher incidence rates.

- The spatial distribution suggests that localized factors may influence colorectal cancer risk and deserve further investigation.


### Real-world impact

<div style="font-size:18px;">

This dashboard helps identify Georgia counties with unusually high colorectal cancer incidence rates, guiding targeted public health action.  

It provides a visual tool to support resource allocation, community outreach, and the investigation of geographic health disparities.

</div>

### Link to github repository

<div style="font-size:18px;">

The full source code is embedded in this document and available for review. The visualizations are generated using plotly and tmap, ensuring interactivity and usability.

🔗 View project source code on <a href="https://github.com/rocinante0v0/Interactive-Cancer-Incidence-Visualizations" target="_blank">GitHub</a>

</div>